25 research outputs found

WavSpA: Wavelet Space Attention for Boosting Transformers' Long Sequence Learning Ability

Author: Shang Jingbo
Tao Fangbo
Wang Zihan
Zhuang Yufan
Publication date: 22/05/2023

The Transformer and its variants are fundamental neural architectures in deep learning. Recent works show that learning attention in the Fourier space can improve the long sequence learning capability of Transformers. We argue that the wavelet transform should be a better choice because it captures both position and frequency information with linear time complexity. Therefore, in this paper, we systematically study the synergy between the wavelet transform and Transformers. We propose Wavelet Space Attention (WavSpA), which facilitates attention learning in a learnable wavelet coefficient space. It replaces the attention in Transformers by (1) applying a forward wavelet transform to project the input sequences onto multi-resolution bases, (2) conducting attention learning in the wavelet coefficient space, and (3) reconstructing the representation in the input space via a backward wavelet transform. Extensive experiments on the Long Range Arena demonstrate that learning attention in the wavelet space using either fixed or adaptive wavelets can consistently improve the Transformer's performance and also significantly outperform learning in the Fourier space. We further show that our method can enhance the Transformer's reasoning extrapolation capability over distance on the LEGO chain-of-reasoning task.

arXiv.org e-Print Archive

A Computational Drug-Target Network for Yuanhu Zhitong Prescription

Author: Fangbo Zhang
Haiyu Xu
Hongjun Yang
Luqi Huang
Peng Lu
Peng Wang
Songsong Wang
Xuefeng Xiao
Ye Tao
Yuan Yuan
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

Yuanhu Zhitong prescription (YZP) is a typical and relatively simple traditional Chinese medicine (TCM), widely used in the clinical treatment of headache, gastralgia, and dysmenorrhea. However, the underlying molecular mechanism of action of YZP is not clear. In this study, based on previous chemical and metabolite analysis, a complex approach including metabolite structure prediction, high-throughput in silico screening, and network reconstruction and analysis was developed to obtain a computational drug-target network for YZP. This was followed by a functional and pathway analysis with ClueGO to determine some of the pharmacologic activities. Further, two new pharmacologic actions of YZP, antidepressant and antianxiety, were validated by animal experiments using zebrafish and mouse models. The forced swimming test and the tail suspension test demonstrated that YZP at doses of 4 mg/kg and 8 mg/kg had better antidepressive activity than the control group. The anxiolytic activity experiment showed that YZP at doses of 100 mg/L, 150 mg/L, and 200 mg/L produced a significant decrease in diving compared to controls. These results not only contribute to a better understanding of the molecular mechanisms by which YZP treats diseases, but also provide some evidence for exploring classic TCM formulas for new clinical applications.

Crossref

Directory of Open Access Journals

Text cube: construction, summarization and mining

Author: Tao Fangbo

A large portion of real-world data is either text or structured (e.g., relational) data. Such data objects are often linked together (e.g., structured product information linked with descriptions and customer reviews). To systematically analyze large numbers of such textual documents, it is often desirable to manage the text data with the associated structured data in a multi-dimensional space (hence "text cube"). This thesis studies the multi-dimensional representation of large textual data. Since Jim Gray introduced the concept of the "data cube", the data cube, associated with online analytical processing (OLAP), has become a driving engine in the data warehouse industry. By modeling a large textual corpus as a "cube", i.e., a multi-dimensional and hierarchical structure, we bridge the power of traditional OLAP and Information Retrieval / Natural Language Processing techniques. In particular, this thesis focuses on two lines of work: one is to construct a multi-dimensional text cube from raw text data with limited user guidance; the other is to develop effective summarization and mining techniques tailored for multi-dimensional queries on text cubes. In the first part of the thesis, the problem of dimension-based structure creation is studied. We propose an end-to-end framework for extracting multi-dimensional structure from a corpus, taking a domain-specific corpus and a limited set of seeds as input and generating high-quality dimension values as output. We introduce the novel concept of the Semantic Pattern Graph to leverage web signals to understand the underlying semantics of lexical patterns, improve pattern evaluation using mined semantics, and yield more accurate and complete structure. Experiments show the effectiveness of our approach. In the second part, with all the dimensions discovered, we study the problem of cell-based document allocation, that is, linking the created dimensions with text data to construct a multi-dimensional text cube.
Allocating documents to the correct multi-dimensional subset (i.e., a cell) with traditional approaches may require substantial labeling from the user. Instead, we propose a model that requires no additional training data besides the given (label) name of each cube dimension as weak supervision. With such weak supervision, we develop a dimension-aware joint embedding framework that learns joint representations for terms, documents, and labels. In the joint embedding process, our method iteratively learns dimension-aware document representations by selectively focusing on discriminative keywords for different dimensions. Furthermore, it alleviates label sparsity by leveraging label representations to enrich the labeled term set. Numerical experiments corroborate the effectiveness of our solution. In the third part, we introduce the concept of Context-Aware Semantic Online Analytical Processing (i.e., CASeOLAP) in text cubes, and use top-k representative phrases to represent the semantics of the document subset in a text cube cell. By ranking phrases with a newly proposed ranking measure according to three criteria (integrity, popularity, and distinctiveness), we identify phrases that can successfully digest the main content of a subset of documents of interest and contrast it with other neighboring subsets. Our experiments on a large news dataset demonstrate the effectiveness of the newly proposed ranking measure in finding representative phrases and its efficiency in both query processing time and storage cost. The approach is also applied to clinical biomarker analysis and protest news analysis with success. In the last part, the EventCube system is proposed to support an end-to-end text cube pipeline in an informative, interactive, and user-friendly manner. The system serves as a general platform for construction, search, summarization, OLAP (online analytical processing), and data mining on integrated text and structured data. The system is a growing testbed for various text-cube-based research and has been successfully applied at NASA for aviation safety report analysis and at the Army Research Lab for counter-terrorism report analysis. To summarize, this thesis provides important results on the construction and consumption of multi-dimensional text cubes and shows their power in tackling real-world text analysis tasks.

Comparison of Short-Duration and Long-Duration Rice Cultivars Cultivated in Various Planting Densities

Author: Fangbo Cao
Jiana Chen
Min Huang
Salah Fatouh Abou-Elwafa
Tao Lei
Yu Liu
Zui Tao
Publication venue: 'MDPI AG'
Publication date: 23/07/2022

The selection of high-yielding, short-duration rice cultivars is essential for the double-rice cropping system. High hill density can be achieved with less labor under machine-transplanted conditions. Therefore, dense planting is more practical for machine-transplanted rice. However, few studies have been conducted to verify the feasibility of combining short-duration cultivars with dense planting under machine-transplanted conditions. The current study was conducted to determine the effects of dense planting on yield attributes and grain yield of short-duration cultivars under mechanically transplanted conditions. A field experiment comprising two treatments, i.e., the short-duration cultivar Lingliangyou104 cultivated at a high planting density (SDH) and the long-duration cultivar Taiyou390 cultivated at a low planting density (LDL), was conducted in the 2018 and 2019 late seasons. The results showed that, compared to the LDL, the SDH exhibited 17% and 19% higher panicle number in a unit, 26% and 24% higher spikelet filling, 8% and 8% higher grain weight, 21% and 11% higher harvest index, and consequently 12% and 4% higher grain yield, with growth duration shortened by 13 and 15 days, in 2018 and 2019, respectively. The data revealed that the difference in grain yield between the SDH and LDL was mainly due to the higher harvest index and reasonable dry matter distribution of the SDH, which was conducive to improving yield components and increasing rice grain yield. As a result, combining short-duration rice cultivars with dense planting is a feasible strategy for improving the grain yield of mechanically transplanted late rice.

Multidisciplinary Digital Publishing Institute

Quantifying accumulation characteristics of glutelin and prolamin in rice grains.

Author: Alin Tian
Fangbo Cao
Guanghui Chen
Jiana Chen
Min Huang
Tao Lei
Yingbin Zou
Yu Liu
Zui Tao
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Glutelin and prolamin are the two major proteins in rice grains. Grain content of glutelin is considerably higher than that of prolamin in rice, but there is limited information on the factors determining the different grain contents of glutelin and prolamin. To address this knowledge gap, the present study compared final weight per grain and accumulation characteristics of glutelin and prolamin in four rice cultivars. Results showed that final glutelin weight per grain was 3.24-3.95 times higher than final prolamin weight per grain. Glutelin and prolamin accumulation processes were well fitted by the logistic equation. The initial, maximum, and mean accumulation rates of glutelin were 1.69-4.67 times higher than those of prolamin. The active accumulation duration of glutelin was 2.9-5.1 d longer than that of prolamin. These results indicate that both higher accumulation rate and longer active accumulation duration are responsible for the higher final weight per grain of glutelin compared to prolamin in rice

Directory of Open Access Journals

Changes in Grain Yield and Yield Attributes Due to Cultivar Development in Indica Inbred Rice in China

Author: Fangbo Cao
Jiana Chen
Longsheng Liu
Min Huang
Ming Zhang
Ruichun Zhang
Zui Tao
Publication venue: 'MDPI AG'
Publication date: 18/10/2022

Inbred rice has been grown more and more widely, while the planting area of hybrid rice has decreased by approximately 25% in China since 1995. This study aimed to assess the changes in grain yield and yield attributes due to cultivar development in indica (Oryza sativa ssp. indica) inbred rice in China. Field experiments were conducted in 2019 and 2020 to determine the performance of grain yield and yield attributes of an indica super inbred rice cultivar Jinnongsimiao (JNSM) released in 2010 by comparing it with an indica high-yielding inbred rice cultivar Guichao 2 (GC2) released in 1978 and an indica super hybrid rice cultivar Y-liangyou 900 (YLY900) released in 2016. Results showed that JNSM produced 18% higher grain yield than GC2 but 6% lower grain yield than YLY900. Compared with GC2, JNSM had higher spikelets per panicle, spikelet-filling percentage, and harvest index by 67%, 4%, and 11%, respectively. Compared with YLY900, JNSM had 14% lower grain weight and 19% lower biomass production during the pre-heading period. The difference in biomass production during the pre-heading period between JNSM and YLY900 was explained more by crop growth rate than growth duration. This study suggests that (1) the recently released indica super inbred rice cultivar JNSM outyields the old indica high-yielding inbred rice cultivar GC2 as a result of increasing panicle size, spikelet-filling percentage, and harvest index, and (2) further improvement in grain yield in indica inbred rice can be achieved by improving biomass production through promoting pre-heading crop growth

Multidisciplinary Digital Publishing Institute

Embedding Learning with Events in Heterogeneous Information Networks

Author: Brandon Norick
Fangbo Tao
Huan Gui
Jialu Liu
Jiawei Han
Lance Kaplan
Meng Jiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Transcriptome Profiling of the Cancer, Adjacent Non-Tumor and Distant Normal Tissues from a Colorectal Cancer Patient by Deep Sequencing

Author: Antonio Moschetta
Fangbo Wu
Fangqin Xue
Guantao Liang
Min Tao
Pengwei Cai
Ruolei Huang
Xuetao Wang
Yan'an Wu
Yi Huang
Publication venue: 'Public Library of Science (PLoS)'

Crossref